Gender Identification in Russian Texts
Authors
Abstract
Gender identification is the task of determining an author's gender from their written text. A hybrid approach has been designed for Russian texts that combines a deep neural network with a rule-based classifier. LSTM and BiLSTM layers are used in the neural network because of their ability to learn long-term dependencies.
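The abstract does not say which rules were used or how the two classifiers were combined, so the following is only a minimal sketch under stated assumptions. It uses one well-known property of Russian, that past-tense verbs agree in gender with the subject (e.g. "я сделала" vs. "я сделал"), as a hypothetical rule, and it lets that rule override the network only when the network is unsure. The names `rule_vote`, `hybrid_predict`, and `margin`, and the probability passed in from an LSTM/BiLSTM model, are illustrative assumptions rather than the authors' method.

```python
import re

# Hypothetical rule (an assumption, not taken from the paper): a first-person
# past-tense verb ending in "-ла"/"-лась" hints at a female author, while one
# ending in "-л"/"-лся" hints at a male author.
FEM = re.compile(r"\bя\s+(?:не\s+)?\w+ла(?:сь)?\b", re.IGNORECASE)
MASC = re.compile(r"\bя\s+(?:не\s+)?\w+л(?:ся)?\b", re.IGNORECASE)


def rule_vote(text):
    """Return 'female', 'male', or None when the rule has no clear evidence."""
    fem = len(FEM.findall(text))
    masc = len(MASC.findall(text))
    if fem > masc:
        return "female"
    if masc > fem:
        return "male"
    return None


def hybrid_predict(text, neural_prob_female, margin=0.15):
    """Combine a neural probability with the rule-based vote.

    neural_prob_female is assumed to come from an LSTM/BiLSTM classifier
    that is not shown here. The combination scheme is an assumption: the
    rule decides only when the network's probability is within `margin`
    of 0.5; otherwise the network's own decision stands.
    """
    vote = rule_vote(text)
    if vote is not None and abs(neural_prob_female - 0.5) < margin:
        return vote
    return "female" if neural_prob_female >= 0.5 else "male"
```

The rule is a rough heuristic: it also fires on non-verbs that happen to end in "л" or "ла", so in practice it would need a morphological analyzer to be reliable.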
Similar resources
Gender Prediction for Authors of Russian Texts Using Regression And Classification Techniques
Automatic extraction of information about the authors of texts (gender, age, psychological type, etc.) from linguistic parameters has gained particular significance: a growing number of online texts come from authors who either avoid providing personal data or make it intentionally deceptive, even though such data is of practical importance in marketing, forensics, and sociology. These studies...
Cross-genre Gender Identification in Russian Texts Using Topic Modeling Working Note: Team DUBL
In this paper, we describe the results of gender identification from Team DUBL. We used a topic modeling approach for identifying the author’s gender based on his/her written texts. The model was trained on the RusProfiling PAN 2017 Twitter Corpus that contains data in the Russian language. The model has been evaluated on texts of other genres, including texts such as letters to a friend, online...
Overview of the RUSProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian
Author profiling consists of predicting some author’s traits (e.g. age, gender, personality) from her writing. After addressing at PAN@CLEF mainly age and gender identification, in this RusProfiling PAN@FIRE track we have addressed the problem of predicting author’s gender in Russian from a cross-genre perspective: given a training set on Twitter, the systems have been evaluated on five differe...
The Winning Approach to Cross-Genre Gender Identification in Russian at RUSProfiling 2017
We present the CIC systems submitted to the 2017 PAN shared task on Cross-Genre Gender Identification in Russian texts (RUSProfiling). We submitted five systems. One of them was based on a statistical approach using only lexical features, and the other four on machine-learning techniques using some combinations of gender-specific Russian grammatical features, word and character n-grams, and suffix n...
Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity
In this study we analyze texts used in the Russian Unified State Exam (USE) in English. The texts, which form two small research corpora, were retrieved from two sources: the official USE database, as a reference point, and “Neznaika” (https://neznaika.pro/), a popular website used by pupils for USE training. The two corpora are balanced in size: USE has 11934 tokens and “Neznaika” has 11918 tokens. We share Biber’...